We work with data on the location and detection time of two mosquito species. Both Aedes aegypti and Aedes albopictus can spread viruses, but Aedes aegypti is more likely to do so and is therefore more dangerous.
First, we use the Mapbox interface in Plotly to create two dot maps (for the years 2004 and 2013) that show the worldwide distribution of the two mosquito species, using color to distinguish between them.
In 2004:
We can see a high density of Aedes aegypti in:
Brazil -> areas: Fortaleza, Sao Paulo
Venezuela -> area: Caracas
Mexico -> area: Veracruz
Indonesia -> area: Jakarta
United States of America -> area: Houston
Trinidad
We can also see a high density of Aedes albopictus in:
United States of America -> areas: Mississippi, Oklahoma
Taiwan -> because all county and city names on the map were written
in Chinese, we were unable to tell them apart.
In 2013:
We can see a high density of Aedes aegypti in:
Brazil -> most areas of Brazil are affected by this mosquito.
We can also see a high density of Aedes albopictus in:
Taiwan -> because all county and city names on the map were written
in Chinese, we were unable to tell them apart.
More countries were affected by these two mosquito species in 2004.
By 2013, only Brazil and Taiwan appear to be affected, although the
density of observations in these two countries is higher.
Overplotting: In some high-density areas, points are drawn on top of
each other. This makes it difficult to distinguish individual
observations (such as the names of the affected areas) and can lead to
a loss of information, for instance in Taiwan in 2004 and 2013 and in
Brazil in 2013.
Data Density: Data points may be numerous in some areas and
sparse in others, depending on the dataset and map scale. This
non-uniform distribution can affect the viewer's perception of where
mosquito populations are concentrated. For instance, in 2004 Taiwan is
not recognizable as a high-density area at the default zoom level; only
after zooming in considerably does it become clear that this area was
seriously affected.
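One rough way to quantify the overplotting described above is to count how many records land on nearly the same map position. The sketch below uses a small made-up data frame with the same column names as the data set (X for longitude, Y for latitude); the coordinates and the rounding to one decimal degree are assumptions chosen for illustration only.

```r
# Hypothetical coordinates; X = longitude, Y = latitude as in the data set
pts <- data.frame(
  X = c(121.50, 121.52, 121.49, -46.63, -38.54),
  Y = c(25.03, 25.04, 25.01, -23.56, -3.72)
)

# Round to one decimal degree so that nearby points share the same key
key <- paste(round(pts$X, 1), round(pts$Y, 1))

# Number of records per rounded position; values above 1 indicate overplotting
overlap <- table(key)
max_overlap <- max(overlap)

# Share of records that sit on a position used by more than one record
share_overplotted <- mean(key %in% names(overlap)[overlap > 1])
```

A large share of overplotted records suggests that a dot map at that zoom level hides part of the data.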
In this task, we compute Z as the number of mosquitoes detected per
country over the whole study period and use the plot_geo() function to
create a choropleth map of the Z values with an Equirectangular
projection.
The Equirectangular projection, also known as the Plate Carrée
projection, is one of the simplest and most commonly used map
projections in cartography and geographic information systems (GIS). It
maps meridians to equally spaced vertical straight lines, and circles of
latitude to equally spaced horizontal straight lines.
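As a minimal sketch of this definition, the function below treats longitude and latitude (in degrees) directly as x and y, which is what the plate carrée does when the equator is the standard parallel. The function name `equirectangular` and the sample grid are illustrative assumptions, not part of the report's code.

```r
# Plate carrée: longitude and latitude (degrees) are used directly as x and y
equirectangular <- function(lon, lat) {
  data.frame(x = lon, y = lat)
}

# Three meridians 30 degrees apart, each sampled at two latitudes
grid <- expand.grid(lon = c(-30, 0, 30), lat = c(0, 60))
xy <- equirectangular(grid$lon, grid$lat)

# Spacing between neighbouring meridians is the same at every latitude
spacing_equator <- diff(xy$x[xy$y == 0])
spacing_60 <- diff(xy$x[xy$y == 60])
```

On the globe the meridians converge towards the poles, so keeping their spacing constant on the map is the source of the distortion at higher latitudes.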
The Equirectangular projection is not ideal for navigation or
large-scale mapping because shapes and sizes become significantly
distorted as you move away from the equator. Other map projections are
better suited to specific needs: conformal projections such as the
Mercator or Lambert Conformal Conic projection preserve local angles
and shapes, while equal-area projections preserve area.
Because the total number of mosquitoes is very small for most countries
and large for only a few, most countries are shown in light red and the
plot conveys very little information.
In addition, Taiwan, the country most affected by these mosquitoes, is
tiny on a global-scale map (the area of a country acts as a size
aesthetic), and there is also overplotting in this area. At first
glance, and without considerable zooming, its high density is not
visible at all.
Comparing the values of Z for Taiwan and Brazil, we find that Z is
24837 for Taiwan and 8501 for Brazil, yet only Brazil stands out on the
map as a high-density area colored in dark red.
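Next, we create the same kind of map as in step 2, but using:
Equirectangular projection with choropleth color log(Z)
Conic equal area projection with choropleth color log(Z)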
This map is more informative than the previous one because the color
mapping is better and affected areas are easier to distinguish.
However, larger regions with the same color look dominant, and it is
hard to identify small regions that are seriously affected (for
instance, Taiwan).
This map shows unusual regions better, and clusters of similar regions
are also easier to find.
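The effect of coloring by log(Z) instead of Z can be illustrated with a few numbers. In the sketch below, the counts for Taiwan and Brazil are the ones quoted in this report, while the three small counts (A, B, C) are hypothetical stand-ins for countries with few records. The helper `rescale01` is an assumption used here to mimic a color scale that runs from the minimum to the maximum value.

```r
# Taiwan and Brazil counts are quoted in the text; A, B and C are hypothetical
z <- c(Taiwan = 24837, Brazil = 8501, A = 5, B = 40, C = 300)

# Position of each value on a colour scale from min (0) to max (1)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

linear_pos <- round(rescale01(z), 2)      # colour position when mapping Z
log_pos <- round(rescale01(log(z)), 2)    # colour position when mapping log(Z)
```

On the linear scale the small counts all sit almost at the light end, whereas on the log scale they are spread over a much wider part of the color range, which is why countries with few records become distinguishable.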
In the Equirectangular projection, latitude and longitude lines are
evenly spaced, making it suitable for choropleth maps where the data
are spread evenly across latitudes and longitudes. It is
straightforward and easy to understand, making it accessible to a wide
range of users.
This projection has minimal distortion near the equator, where shapes
and angles are approximately preserved, which can be beneficial when
working with data concentrated in that region. As you move away from
the equator, however, distortion becomes significant and areas appear
stretched horizontally (east-west). This makes it less suitable for
displaying data near the poles.
Choropleth maps with data concentrated at high latitudes may not be
effectively represented using the equirectangular projection.
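The growth of distortion with latitude can be made concrete. In the plate carrée with the equator as standard parallel, the east-west scale relative to the equator is 1 / cos(latitude). The sketch below evaluates this factor at a few latitudes; the function name `ew_stretch` and the chosen latitudes are illustrative assumptions.

```r
# East-west stretch of the plate carrée relative to true scale at the equator
ew_stretch <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

# Equator, a low latitude (roughly that of Taiwan), and a high latitude
lat <- c(0, 23.5, 60)
stretch <- ew_stretch(lat)
```

The factor is 1 at the equator, stays close to 1 at low latitudes, and reaches 2 at 60 degrees, so the distortion mainly matters for data at high latitudes.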
The conic equal area projection preserves area and keeps distortion low
within a limited region of interest, making it suitable for maps
focused on a specific part of the world. It may not be appropriate for
displaying global data or data distributed over a wide range of
latitudes.
Conic projections can be more complex to understand and work with
compared to the equirectangular projection, requiring more advanced
cartographic skills.
For instance, Taiwan is clearly visible in the conic equal area
projection, whereas it was not easy to see in the Equirectangular
projection. The choice of projection can therefore affect our results
and decisions.
In this part, in order to resolve the problems detected in step 1, we
use the 2013 data for Brazil only and follow the steps below:
## # A tibble: 6 × 5
## # Groups:   X1 [5]
##   X1            Y1            mean_X mean_Y     N
##   <fct>         <fct>          <dbl>  <dbl> <int>
## 1 [-72.8,-72.4] (-8.21,-7.84]  -72.8  -7.96     1
## 2 (-71.2,-70.8] (-8.94,-8.57]  -70.8  -8.9      1
## 3 (-70.4,-70]   (-10.8,-10.4]  -70.0 -10.7      1
## 4 (-70,-69.6]   (-4.14,-3.77]  -69.7  -4.03     1
## 5 (-69.6,-69.2] (-10.8,-10.4]  -69.2 -10.7      1
## 6 (-69.6,-69.2] (-10.1,-9.68]  -69.4  -9.77     1
The regions of Brazil most affected by mosquitoes in 2013
include:
Caicara: N=16
Guarabira: N=15
Brejinho: N=14
Sao Paulo: N=13
Sitio Cruz de Alma: N=13
Mogeiro: N=13
By discretizing the geographic space into smaller cells, we
aggregate the data within each cell, which makes it easier to analyze
patterns and variations in mosquito distribution at a regional level.
We can see which grid cells contain more or fewer mosquito records,
which helps in understanding where mosquitoes are concentrated and
where they are less prevalent.
Discretization also helps in analyzing the distribution of mosquitoes
by simplifying the spatial data, enabling statistical analysis, and
supporting effective visual communication of the findings. It provides
a structured approach to studying mosquito populations in different
regions, helping researchers and decision-makers better understand and
manage mosquito-related issues.
In summary, with such discretization we can see the distribution of
mosquitoes quickly and more easily identify the regions that are most
affected.
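The discretization idea can be sketched in base R as follows. This is an analogue of the approach, not the report's exact code: it uses `cut()` and `aggregate()` on a few hypothetical coordinates and only 4 intervals per axis, whereas the report cuts each axis into 100 intervals with `cut_interval()` from ggplot2. Note that `cut()` pads the range slightly, so its bin edges are not identical to those of `cut_interval()`.

```r
# Hypothetical longitudes (X) and latitudes (Y); not taken from the data set
pts <- data.frame(
  X = c(-46.6, -46.7, -46.5, -38.5, -38.6, -60.0),
  Y = c(-23.5, -23.6, -23.4, -3.7, -3.8, -3.1)
)

# Cut each axis into 4 equal-width intervals (the report uses 100)
pts$X1 <- cut(pts$X, breaks = 4)
pts$Y1 <- cut(pts$Y, breaks = 4)

# One row per occupied cell: mean position of the points in the cell
cells <- aggregate(cbind(X, Y) ~ X1 + Y1, data = pts, FUN = mean)

# Number of points N in each cell (same grouping, so the rows line up)
cells$N <- aggregate(X ~ X1 + Y1, data = pts, FUN = length)$X

# Most populated cells first
cells <- cells[order(-cells$N), ]
```

Each row of `cells` can then be drawn as a single marker whose size reflects N, which replaces a cloud of overlapping dots with one symbol per grid cell.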
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
# Visualization of mosquito populations
data <- read.csv("aegypti_albopictus.csv")
library(dplyr)
library(ggplot2)
#library(MASS)
library(plotly)
# Part 1
#library(dplyr)
#mapbox
#Create dot map for year=2004
Sys.setenv('MAPBOX_TOKEN' = "pk.eyJ1IjoibW9ubW81MzgiLCJhIjoiY2xtbnZ3MDRsMHkwNjJrcG5jaW1raXVybyJ9.HxbeuV14U8LSJL_nG5fGiw")
data1 <- data %>% filter(YEAR == "2004")
p <- plot_mapbox(data1) %>%
  add_trace(type = "scattermapbox", mode = "markers",
            lat = ~Y, lon = ~X,
            color = ~VECTOR, text = ~COUNTRY)
p <- p %>% layout(
  title = "Mosquito Populations in 2004",
  mapbox = list(style = "open-street-map")
)
p
#Create dot map for year=2013
data2 <- data %>% filter(YEAR == "2013")
p <- plot_mapbox(data2) %>%
  add_trace(type = "scattermapbox", mode = "markers",
            lat = ~Y, lon = ~X,
            color = ~VECTOR, text = ~COUNTRY)
p <- p %>% layout(
  title = "Mosquito Populations in 2013",
  mapbox = list(style = "open-street-map")
)
p
# Part 2
# Z: number of mosquito records per country over the whole study period
Z <- data %>%
  group_by(COUNTRY) %>%
  # n() counts the number of rows (observations) within each group
  summarise(Total_num_Mosquitos = n())
# Attach each country's ID to its total using COUNTRY as the key;
# one row per country is enough for a choropleth map
merged_data <- data %>%
  distinct(COUNTRY, COUNTRY_ID) %>%
  inner_join(Z, by = "COUNTRY")
# Create the choropleth map
# WORLD MAP
g <- list(
  projection = list(type = 'equirectangular') # Use equirectangular projection
)
p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~Total_num_Mosquitos, color = ~Total_num_Mosquitos, colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country",
    geo = g
  )
p
# Create the choropleth map: Equirectangular projection with choropleth color log(Z)
# WORLD MAP
g <- list(
  projection = list(type = 'equirectangular') # Use equirectangular projection
)
p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~log(Total_num_Mosquitos), color = ~log(Total_num_Mosquitos), colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country_Equirectangular projection with choropleth color log(Z)",
    geo = g
  )
p
# Create the choropleth map: Conic equal area projection with choropleth color log(Z)
# WORLD MAP
g <- list(
  projection = list(type = 'conic equal area') # Use conic equal area projection
)
p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~log(Total_num_Mosquitos), color = ~log(Total_num_Mosquitos), colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country_Conic equal area projection with choropleth color log(Z)",
    geo = g
  )
p
# Filter the data to 2013 and Brazil only
data3 <- data %>% filter(YEAR == "2013" & COUNTRY == "Brazil")
# Create variable X1 by cutting X into 100 pieces
data3$X1 <- cut_interval(data3$X, n = 100)
# Create variable Y1 by cutting Y into 100 pieces
data3$Y1 <- cut_interval(data3$Y, n = 100)
# Part 4-c
#Compute mean values of X and Y per group (X1,Y1) and the amount of observations
#N per group (X1,Y1)
mean_X_Y <- data3%>%
group_by(X1,Y1)%>%
summarize(mean_X = mean(X),
mean_Y = mean(Y),
N=n())
head(mean_X_Y)
# Visualize mean X, mean Y and N by using MapBox
# (MAPBOX_TOKEN is already set at the top of Part 1)
p <- plot_mapbox(mean_X_Y) %>%
  add_trace(
    type = "scattermapbox",
    lat = ~mean_Y,
    lon = ~mean_X,
    mode = "markers",
    # Hover label showing the cell centre and the number of observations
    text = ~paste("Mean_X =", mean_X,
                  "Mean_Y = ", mean_Y,
                  "N = ", N),
    hoverinfo = "text",
    # Marker size is proportional to the number of observations in the cell
    marker = list(size = mean_X_Y$N)
  )
p <- p %>% layout(
  title = "mean X,Y and N",
  mapbox = list(style = "open-street-map")
)
p
# p <- plot_mapbox(x=mean_X_Y$mean_X, Y= mean_X_Y$mean_Y, color=~mean_X_Y$N)
# p